Experimental: Unified LSP server in rewatch #8243

Draft
nojaf wants to merge 142 commits into rescript-lang:master from nojaf:rewatch-lsp

nojaf commented Feb 9, 2026

This branch explores embedding a full LSP server directly into the rescript binary (rescript lsp), replacing the current architecture where a Node.js extension mediates between the editor and separate build/analysis processes.

The core idea

Today, the ReScript editor experience involves three processes: a Node.js VS Code extension, the rescript build watcher, and the rescript-editor-analysis.exe binary. They communicate through files on disk — the editor extension launches builds, waits for artifacts, then shells out to the analysis binary for each request.

This branch collapses the build system and LSP server into a single Rust process using tower-lsp. The build state lives in memory, and analysis requests shell out to the same rescript-editor-analysis.exe but with source code passed via stdin instead of being read from disk.

No temp files — stdin everywhere

Both bsc and the analysis binary receive source code via stdin rather than through temporary files. For didChange (unsaved edits), bsc -bs-read-stdin produces diagnostics without writing anything to disk. For analysis requests (hover, completion, code actions, etc.), the analysis binary receives a JSON blob on stdin containing the source text, cursor position, and package metadata. The OCaml analysis code was refactored with FromSource variants that parse from a string rather than opening files — so everything works correctly on unsaved editor buffers.
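
The stdin hand-off described above can be sketched with the standard library alone. This is a minimal illustration, not the branch's code: `run_with_stdin` is a hypothetical helper, and the program and arguments are left to the caller.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: runs `program` with `args`, feeds `source` to its
/// stdin, and returns whatever the child wrote to stdout.
pub fn run_with_stdin(program: &str, args: &[&str], source: &str) -> io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write from a separate thread so a large buffer cannot deadlock against
    // a child that is filling its stdout pipe at the same time. The pipe is
    // closed (EOF for the child) when the closure finishes and drops `stdin`.
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let input = source.to_owned();
    let writer = thread::spawn(move || stdin.write_all(input.as_bytes()));

    let output = child.wait_with_output()?;
    writer
        .join()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "stdin writer thread panicked"))??;

    String::from_utf8(output.stdout).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

For diagnostics on an unsaved buffer, the caller would presumably pass the bsc path, the `-bs-read-stdin` flag and the file name as `program` and `args`; the exact argument list used by the branch is not shown here.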

Separate build profile: lib/lsp

The LSP server writes its build artifacts to lib/lsp/ instead of lib/bs/. This means it doesn't conflict with rescript build or rescript build -w running in a terminal — both can operate independently on the same project without stepping on each other's artifacts.

Initial build: typecheck only

On initialized, the server runs a full build but only goes as far as producing .cmt/.cmi files (the TypecheckOnly profile). It deliberately skips JS emission. This gets the editor operational as fast as possible — type information for hover, completion, go-to-definition etc. is all available, without paying the cost of generating JavaScript for every module upfront.

Smart incremental builds on save

When a file is saved, the server runs a two-phase incremental build:

  1. Emit JS for the dependency closure — the server computes the transitive imports of the saved file and only emits JavaScript for that file and its dependencies. Modules outside this closure are skipped entirely. So saving a module produces JS for it and any imports that haven't been compiled yet — not the entire project.

  2. Typecheck reverse dependencies — modules that transitively depend on the saved file are re-typechecked to surface errors caused by API changes (e.g. a removed export). This gives you project-wide diagnostics on save — if you rename a function, you immediately see errors in every file that uses it, even files you don't have open. No JS is emitted for these — they get their JS when they are themselves saved.
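
The two phases amount to two graph walks from the saved module: one along imports, one along reverse imports. The sketch below assumes a plain in-memory dependency map; `DepGraph`, `closure`, `reverse`, `plan_save` and `graph_of` are illustrative names, not identifiers from the branch.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Module name -> names of the modules it imports directly.
pub type DepGraph = BTreeMap<String, BTreeSet<String>>;

/// Every module reachable from `start` by following edges, including `start`.
pub fn closure(graph: &DepGraph, start: &str) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start.to_string()];
    while let Some(name) = stack.pop() {
        if seen.insert(name.clone()) {
            if let Some(next) = graph.get(&name) {
                stack.extend(next.iter().cloned());
            }
        }
    }
    seen
}

/// Flips every edge, giving a map from a module to its direct dependents.
pub fn reverse(graph: &DepGraph) -> DepGraph {
    let mut rev = DepGraph::new();
    for (module, deps) in graph {
        for dep in deps {
            rev.entry(dep.clone()).or_default().insert(module.clone());
        }
    }
    rev
}

/// Returns (modules to emit JS for, modules to re-typecheck only).
pub fn plan_save(graph: &DepGraph, saved: &str) -> (BTreeSet<String>, BTreeSet<String>) {
    let emit = closure(graph, saved);
    let mut typecheck = closure(&reverse(graph), saved);
    typecheck.remove(saved); // the saved module is already handled in phase 1
    (emit, typecheck)
}

/// Builds a graph from (module, imported module) pairs.
pub fn graph_of(edges: &[(&str, &str)]) -> DepGraph {
    let mut graph = DepGraph::new();
    for (module, dep) in edges {
        graph.entry(module.to_string()).or_default().insert(dep.to_string());
    }
    graph
}
```

With edges App → Button, Page → Button and Button → Utils, saving Button would put Button and Utils in the emit set and App and Page in the typecheck-only set.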

What's implemented

All standard analysis endpoints are wired up: completion (with resolve), hover, signature help, go to definition, type definition, references, rename (with prepare), document symbols, code lens, inlay hints, semantic tokens, code actions, and formatting.

Observability

Every LSP request and build operation is traced with OpenTelemetry spans, viewable in Jaeger. This makes it straightforward to profile request latency and understand what the server is doing.

Test infrastructure

Each endpoint has integration tests using vscode-languageserver-protocol that boot a real LSP server in a sandbox, send requests, and snapshot both the results and the OTEL trace structure.

What's not here yet

  • workspace/didChangeWatchedFiles — handling external file changes (git checkout, etc.)
  • Multi-workspace / monorepo support
  • createInterface and openCompiled custom commands

This is an experiment to validate the architecture. If it proves useful, individual pieces can be split into focused PRs.

nojaf added 30 commits February 6, 2026 11:48
Add optional OTLP tracing export to rewatch, controlled by the
OTEL_EXPORTER_OTLP_ENDPOINT environment variable. When set, rewatch
exports spans via HTTP OTLP; when unset, tracing is a no-op.

Instrument key build system functions (initialize_build, incremental_build,
compile, parse, clean, format, packages) with tracing spans and attributes
such as module counts and package names.

Restructure main.rs to support telemetry lifecycle (init/flush/shutdown)
and fix show_progress to use >= LevelFilter::Info so -v/-vv don't
suppress progress messages. Also print 'Finished compilation' in
plain_output mode during watch full rebuilds.

Introduce a new Vitest-based test infrastructure in tests/rewatch_tests/
that replaces the bash integration tests. Tests spawn rewatch with an
OTLP endpoint pointing to an in-process HTTP receiver, collect spans,
and snapshot the resulting span tree for deterministic assertions.

Update CI, Makefile, and scripts/test.js to use the new test runner.
When stdin is a pipe (not a TTY), spawn a background thread that
monitors for EOF. This allows a parent process (such as the test
harness) to signal a graceful shutdown by closing stdin, without
relying on signals or lock file removal.
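
A rough sketch of such an EOF monitor, using only the standard library. The function names are made up for illustration, and a channel is assumed as the way to deliver the shutdown signal; the real implementation may differ.

```rust
use std::io::{self, ErrorKind, IsTerminal, Read};
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Drains `reader` on a background thread and sends one message once it
/// reaches EOF (or fails with a non-retryable error).
pub fn notify_on_eof<R: Read + Send + 'static>(mut reader: R) -> Receiver<()> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = [0u8; 1024];
        loop {
            match reader.read(&mut buf) {
                Ok(0) => break,    // EOF: the parent closed its end of the pipe
                Ok(_) => continue, // any data that does arrive is ignored
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(_) => break,
            }
        }
        // The receiver may already be gone; that is fine.
        let _ = tx.send(());
    });
    rx
}

/// Only monitors stdin when it is not a terminal, so interactive use is unaffected.
pub fn watch_stdin_for_shutdown() -> Option<Receiver<()>> {
    let stdin = io::stdin();
    if stdin.is_terminal() {
        None
    } else {
        Some(notify_on_eof(stdin))
    }
}
```

The main loop could then poll the receiver with `try_recv` between build cycles and begin a graceful shutdown once a message arrives.
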
Add mtime and content-hash based deduplication to filter out phantom
and duplicate file system events. Normalize event kinds from atomic
writes (temp file + rename) so they are treated as content modifications
rather than create/remove cycles that trigger unnecessary full rebuilds.

This fixes issues on macOS (Create events from atomic writes), Linux
(duplicate inotify IN_MODIFY events), and Windows (Remove+Rename
sequences from atomic writes).
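
One way to picture the deduplication is shown below. It is a sketch under assumptions: the commit does not say how the mtime and hash checks are combined, so the order used here (mtime first, then content hash) is a guess. `EventDeduper` is a hypothetical type, and `DefaultHasher` stands in for whatever hash the real code uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};
use std::time::SystemTime;

#[derive(Debug, Clone, PartialEq, Eq)]
struct Fingerprint {
    mtime: SystemTime,
    content_hash: u64,
}

/// Remembers the last fingerprint seen per path (hypothetical helper).
#[derive(Default)]
pub struct EventDeduper {
    seen: HashMap<PathBuf, Fingerprint>,
}

fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

impl EventDeduper {
    /// Returns true when the event reflects a real content change.
    pub fn is_real_change(&mut self, path: &Path, mtime: SystemTime, content: &[u8]) -> bool {
        // Cheap check first: an unchanged mtime is treated as a duplicate event.
        if let Some(prev) = self.seen.get(path) {
            if prev.mtime == mtime {
                return false;
            }
        }
        // The mtime moved (e.g. an atomic write); compare the content itself.
        let content_hash = hash_bytes(content);
        let unchanged = self
            .seen
            .get(path)
            .map_or(false, |prev| prev.content_hash == content_hash);
        self.seen
            .insert(path.to_path_buf(), Fingerprint { mtime, content_hash });
        !unchanged
    }
}
```

A watcher would call `is_real_change` for every modify-like event and only trigger a rebuild when it returns true.
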
On Windows, bsc writes CRLF to stdout in text mode. When the original
source file uses LF line endings, the formatted output would introduce
unwanted CRLF conversions. Detect the original file's line ending style
and normalize the formatted output to match.
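
The normalization can be sketched as follows. The detection heuristic (look at the first line break only) is an assumption made for brevity, and the function names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineEnding {
    Lf,
    Crlf,
}

/// Guesses the line ending style from the first line break in `source`.
pub fn detect_line_ending(source: &str) -> LineEnding {
    match source.find('\n') {
        Some(i) if i > 0 && source.as_bytes()[i - 1] == b'\r' => LineEnding::Crlf,
        _ => LineEnding::Lf,
    }
}

/// Rewrites every line break in `formatted` to the `target` style.
pub fn normalize_line_endings(formatted: &str, target: LineEnding) -> String {
    // Collapse to LF first so mixed input cannot produce "\r\r\n".
    let lf = formatted.replace("\r\n", "\n");
    match target {
        LineEnding::Lf => lf,
        LineEnding::Crlf => lf.replace('\n', "\r\n"),
    }
}

/// Makes the formatter output use the same line endings as the original file.
pub fn match_original(original: &str, formatted: &str) -> String {
    normalize_line_endings(formatted, detect_line_ending(original))
}
```
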
Propagate parent span through rayon in build.parse so build.parse_file
spans are properly nested under build.parse instead of appearing as
orphaned root spans.

Enrich build.compile_file span with package, suffix, module_system,
and namespace attributes for better observability.

Handle invalid config changes gracefully during watch mode: replace
.expect() with match to report the error and continue watching,
allowing the user to fix the config without restarting the watcher.
Add 7 new fixture packages to cover more configuration dimensions:
- commonjs: CommonJS module output with .bs.js suffix
- namespaced: namespace package with TestNS
- noop-ppx: lightweight cross-platform no-op PPX for testing
- with-deps: package depending on rescript-bun for clean tests
- with-dev-deps: multi-source dirs with dev dependencies
- with-jsx: JSX v4 with @rescript/react
- with-ppx: PPX integration using noop-ppx

Enhance test helpers:
- Normalize CRLF line endings in process output for Windows
- Support .bs.js artifacts in sandbox cleanup detection
- Add createCli, readFileInSandbox, writeFileInSandbox helpers
- Add OTEL config for build.parse_file and enriched compile_file spans
- Exclude noop-ppx from biome linting (CommonJS required)
Add tests for core build functionality:
- Build from a package subdirectory
- No stale artifacts on second build
- Filter flag to compile only matching modules

Add build error tests:
- Parse error reporting with file location
- Type error reporting
- Errors when a dependency module is deleted
- Circular dependency detection
Add module operation tests:
- File rename with and without dependents
- Duplicate module name detection
- Interface file compilation and error cases

Add namespace package tests:
- Build with namespace flag
- Namespace in compiler args
- File rename in namespaced package

Add dev-dependency tests:
- Dev source compiles with dev dependencies
- Non-dev source cannot use dev dependencies
- Clean removes dev source artifacts
Add build config tests:
- Experimental feature flags (valid, invalid key, invalid format)
- After-build hook execution (success and failure)
- Warning configuration in compiler args
- Warn-error CLI override
- Deprecated and unknown config field warnings

Add module system tests:
- CommonJS package with .bs.js suffix
- CommonJS in compiler args
- Suffix change triggers rebuild
- Duplicate package-spec suffix error

Add PPX integration tests using lightweight noop-ppx:
- PPX build produces output
- PPX flags in parser args
- PPX flags not in compiler args

Add JSX tests:
- JSX v4 build with @rescript/react
- JSX flags in parser args
- JSX preserve flag
Add tests for scoped clean, node_modules dependency cleanup,
and verifying no false compiler-update message after clean+rebuild.
Add format tests:
- Stdin formatting for .res and .resi
- Single file and all-files formatting
- Subdirectory-scoped formatting
- Check mode (pass and fail cases)

Add compiler-args tests:
- CWD invariance (same output from root and subdirectory)
- Warning flags in both parser and compiler args
Verify that a concurrent build is prevented while watch mode
holds the lock file.
Add watch mode tests:
- New file creation triggers compilation
- Warning persistence across incremental builds
- Config change triggers full rebuild
- Changes outside source dirs are ignored
- Missing source folder does not crash watcher
- Invalid config change recovery (watcher keeps running)
- File rename removes old artifacts and compiles new file
- File deletion removes artifacts
Tracing spans are thread-local, so compile_file spans created inside
Rayon's par_iter had no parent connection to the compile_wave span on
the main thread. Pass the wave span explicitly via `parent: &wave_span`
to establish the correct parent-child relationship.
When a file is saved in the LSP, only compile the saved file and its
transitive dependencies instead of every module in the project.

After the initial LSP build (TypecheckOnly), all modules sit at
CompilationStage::TypeChecked. A TypecheckAndEmit build targets Built,
so every module would enter the compile universe. In a large project
this means the first save compiles the entire codebase to JS.

Fix this by computing the downward dependency closure of the saved file
and temporarily promoting modules outside that closure to Built. After
the incremental build, promoted modules are restored to TypeChecked.
Modules already at Built from a previous save are left untouched.

Also change mark_file_parse_dirty to return Option<String> (the module
name) so did_save can identify the entry point for the closure walk.
Add single-file typecheck on unsaved edits (didChange). The unsaved buffer
content is written to a temp file in the build directory and passed to bsc
directly with TypecheckOnly. Diagnostics are remapped back to the original
source path.

Refactor didSave into two phases:
- compile_dependencies (TypecheckAndEmit): compile the saved file and its
  transitive imports to produce JS output.
- typecheck_dependents (TypecheckOnly): re-typecheck modules that
  transitively import the saved file to surface errors from API changes,
  without emitting JS.

This means saving Library.res immediately shows type errors in App.res
without needing to save App.res first.

Other changes:
- Extract find_module_for_file helper on BuildCommandState
- Add get_dependent_closure (reverse dependency traversal)
- Use #[instrument] consistently for OTEL spans in the lsp/ folder
- Register new OTEL spans in test-context.mjs
Add an internal `-bs-read-stdin` flag to bsc that reads source from stdin
instead of from the file argument. The filename argument is still required
for error locations, file kind classification, and output prefix derivation.

Update the LSP didChange handler to pipe unsaved buffer content directly to
bsc's stdin instead of writing temporary files to disk. This eliminates
unnecessary filesystem I/O on every keystroke.

Key changes:
- compiler: add `Js_config.read_stdin` flag and `-bs-read-stdin` CLI option
- compiler: add `Res_io.read_stdin` and `Res_driver.parse_*_from_stdin`
- compiler: disable `binary_annotations` when reading from stdin (avoids
  Digest.file call on non-existent source file)
- rewatch: replace temp file write/cleanup in did_change.rs with stdin piping
Add a new `completion-rewatch` subcommand to the analysis binary that
receives all needed context (pathsForModule, opens, package config) via
JSON on stdin, bypassing the expensive project discovery that the
existing `completion` command performs.

Analysis binary changes:
- Add `CommandsRewatch.ml` with JSON parsing and package construction
- Add `CompletionFrontEnd.completionWithParserFromSource` to parse from
  a source string instead of reading from disk
- Add `Completions.getCompletionsFromSource` that takes source + package
- Add `Cmt.loadFullCmtWithPackage` that uses a pre-built package record
  instead of calling `Packages.getPackage`

Rust LSP changes:
- Track open buffers in `Backend.open_buffers` (updated on didChange)
- Enable completion_provider capability with trigger characters
- Add `lsp/completion.rs` that builds the JSON blob with all module/package
  context, spawns `rescript-editor-analysis.exe completion-rewatch`, and
  deserializes the LSP-conformant response
- If no .cmt exists yet (completion before any didChange), run a
  typecheck first to produce it

Test infrastructure:
- Add `completeFor` helper to lsp-client.mjs
- Add `lsp.completion` span to OTEL summary
- Add completion integration test
nojaf added 30 commits February 15, 2026 16:36
Use separate span names for compile and typecheck operations:
- build.compile / build.compile_wave / build.compile_file for full builds
- build.typecheck / build.typecheck_wave / build.typecheck_file for
  typecheck-only

Also add scope and output attributes to the incremental_build span,
and Display impls for OutputTarget and CompileMode.
hashes

CompilationStage now carries blake3 hashes at each stage:
Dirty → Parsed { source_hash, ast_hash }
→ TypeChecked { +cmi_hash, +cmt_hash }
→ Built { +cmj_hash }

This fixes the LSP performance issue where saving a single file caused
all modules to be recompiled. After the initial FullTypecheck build,
every module sat at TypeChecked. CompileDependencies with target Built
would mark all of them as originally_dirty since TypeChecked < Built.

Now CompileDependencies uses is_dirty() for originally_dirty, so only
modules whose source actually changed are marked dirty. The skip logic
checks needs_compile_for_mode() to determine if a non-dirty module
already reached the target stage. On the second save, all 444 unchanged
modules are skipped in sub-millisecond time.

Also fixes a pre-existing bug in mark_modules_with_expired_deps_dirty
where both sides of a tuple checked last_compiled_cmt instead of
last_compiled_cmi and last_compiled_cmt.
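
To make the skip logic concrete, here is a reduced sketch of a stage enum that carries hashes and answers the two questions the commit mentions. The field layout follows the progression listed above, but `ContentHash` is a stand-in for a blake3 hash and the method bodies are guesses at the intent, not the branch's code.

```rust
/// Stand-in for a blake3 hash (assumption: 32 raw bytes).
pub type ContentHash = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompileMode {
    TypecheckOnly,
    FullCompile,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CompilationStage {
    Dirty,
    Parsed {
        source_hash: ContentHash,
        ast_hash: ContentHash,
    },
    TypeChecked {
        source_hash: ContentHash,
        ast_hash: ContentHash,
        cmi_hash: ContentHash,
        cmt_hash: ContentHash,
    },
    Built {
        source_hash: ContentHash,
        ast_hash: ContentHash,
        cmi_hash: ContentHash,
        cmt_hash: ContentHash,
        cmj_hash: ContentHash,
    },
}

impl CompilationStage {
    /// True only when the source itself changed.
    pub fn is_dirty(&self) -> bool {
        matches!(self, CompilationStage::Dirty)
    }

    /// True when the module has not yet reached the stage `mode` aims for.
    pub fn needs_compile_for_mode(&self, mode: CompileMode) -> bool {
        match (self, mode) {
            // Already has JS output: nothing left to do in either mode.
            (CompilationStage::Built { .. }, _) => false,
            // Typechecked is enough when no JS is requested.
            (CompilationStage::TypeChecked { .. }, CompileMode::TypecheckOnly) => false,
            _ => true,
        }
    }
}
```

Under this reading, a module left at `TypeChecked` by the initial build is not dirty, yet it still needs work the first time a `FullCompile` pass reaches it, and none on later passes once it is `Built`.
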
modules

Replace the flat Module struct + SourceType discriminator with a proper
enum: Module::SourceFile(SourceFileModule) and
Module::MlMap(MlMapModule).

This ensures compilation-stage fields (compilation_stage, last_compiled_cmi,
last_compiled_cmt, is_type_dev, needs_dependencies_rescan) only exist on
source file modules, making it a compile error to access them on mlmap
modules. Removes the fake blake3::hash(b"mlmap") sentinels that were
previously stuffed into CompilationStage::Built to satisfy the type system.

Accessor methods on Module (package_name, deps, dependents, get_interface,
needs_compile_for_mode) provide clean access to shared fields without
requiring callers to match on the variant.

In parse.rs, when mlmap output changes, dependents are now directly marked
dirty instead of setting a compilation_stage on the mlmap module itself.
The CompilationStage enum already tracks cmi_hash in its TypeChecked and
Built variants, making the separate last_compiled_cmi timestamp
redundant.

- Remove last_compiled_cmi from SourceFileModule
- Remove cmi_modules from CompileAssetsState and read_compile_state
- Simplify staleness check to use last_compiled_cmt.is_none()
Remove last_compiled_cmi and last_compiled_cmt from SourceFileModule.

last_compiled_cmi was fully redundant — CompilationStage already
tracks cmi_hash in its TypeChecked and Built variants.

last_compiled_cmt is moved into CompilationStage as a compiled_at
field on the TypeChecked and Built variants, co-locating the timestamp
with the stage where a .cmt is actually produced. This makes it
impossible for the stage and timestamp to get out of sync.

Also removes the now-unused cmi_modules field from CompileAssetsState
and its collection in read_compile_state.
compile errors

When a file (e.g. App.res) fails to compile because a dependency
(e.g. Button.res) doesn't expose the expected API, saving the
dependency with the fix should regenerate JS for both files. Previously,
the dependent only got typechecked with no JS output.

Root cause: `mark_modules_with_expired_deps_dirty` reset modules at
`Parsed` stage back to `Dirty`, losing their source/AST hashes. This
prevented `CompileError` from ever being set, so the LSP's third build
step (compile_resolved_errors) never fired.

Changes:
- Add `CompileError` variant to `CompilationStage` to track modules
  that failed compilation while preserving their parse hashes
- After the compile loop, mark failed modules as `CompileError`
- Skip `CompileError` and `Parsed` modules in
  `mark_modules_with_expired_deps_dirty` to preserve their stage
- Add step 3 (`compile_resolved_errors`) to the LSP save-build flow:
  snapshot `CompileError` modules before typechecking dependents, then
  fully compile any that moved to `TypeChecked`
- Always emit the `compile_resolved` OTEL span for observability
- Add two regression tests for stale JS scenarios
- Add `module_name` field to `SourceFileModule` so tracing includes
  which module is transitioning
- Make `compilation_stage` private; add `compilation_stage()` getter
  and `set_compilation_stage()` setter that logs every transition via
  `tracing::debug!` for OTEL observability
- Add `SourceFileModule::new()` constructor (always starts at Dirty)
- Add `CompilationStage::can_transition_to()` with `debug_assert!` in
  the setter to catch invalid state transitions during development
- Document every valid transition with the file/scenario that causes it
- Remove manual `tracing::debug!` calls at individual mutation sites
  (now handled centrally by the setter)
Tighten the CompilationStage state machine so transitions always follow
the full progression: Dirty → Parsed → TypeChecked → Built. Previously,
clean.rs and compile.rs would skip intermediate stages (e.g. Dirty →
Built, Parsed → Built), making it harder to reason about the state
machine.

Changes:
- clean.rs: check ast_is_fresh before restoring stages, preventing
  spurious Dirty → TypeChecked when source changed while process was
  down. Restructure artifact restoration to step through Parsed →
  TypeChecked → Built explicitly.
- compile.rs: always set TypeChecked before Built on successful
  FullCompile, rather than jumping directly to Built.
- build_types.rs: remove now-impossible transitions (Dirty → TypeChecked,
  Dirty → Built, Parsed → Built, CompileError → Built, Built → Built)
  from can_transition_to(). The debug_assert! in set_compilation_stage
  enforces these constraints in test builds.
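
The tightened state machine can be illustrated with a data-free version of the stage enum. The rejected transitions below are the ones the commit lists as removed; the accepted ones (including "any stage may go back to Dirty") are assumptions made to keep the sketch complete. `ModuleState` is a hypothetical wrapper.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Dirty,
    Parsed,
    CompileError,
    TypeChecked,
    Built,
}

impl Stage {
    /// Whether moving from `self` to `next` is a legal step.
    pub fn can_transition_to(self, next: Stage) -> bool {
        use Stage::*;
        match (self, next) {
            (_, Dirty) => true, // assumption: invalidation is always allowed
            (Dirty, Parsed) => true,
            (Parsed, TypeChecked) | (Parsed, CompileError) => true,
            (CompileError, TypeChecked) => true,
            (TypeChecked, Built) => true,
            // Everything else, e.g. Dirty -> Built or Parsed -> Built, is rejected.
            _ => false,
        }
    }
}

/// Keeps the stage private so every change goes through the checked setter.
pub struct ModuleState {
    name: String,
    stage: Stage,
}

impl ModuleState {
    /// New modules always start at Dirty.
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), stage: Stage::Dirty }
    }

    pub fn stage(&self) -> Stage {
        self.stage
    }

    pub fn set_stage(&mut self, next: Stage) {
        // Only checked in debug builds, matching the debug_assert! approach.
        debug_assert!(
            self.stage.can_transition_to(next),
            "invalid transition for {}: {:?} -> {:?}",
            self.name,
            self.stage,
            next
        );
        self.stage = next;
    }
}
```

Routing every mutation through one setter also gives a single place to log transitions, which is what the earlier commit describes.
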
CompilationStage with interface hashes

Replace per-file `parse_dirty: bool` flags on `Implementation` and
`Interface` with `CompilationStage` as the single source of truth for
whether a module needs parsing. Since `parse.rs` already re-parses both
files when either is dirty, the per-file granularity didn't save work
and was a source of subtle desync.

Enrich `Parsed`, `CompileError`, `TypeChecked`, and `Built` variants
with explicit implementation and interface hash fields
(`implementation_source_hash`, `implementation_ast_hash`,
`interface_source_hash`, `interface_ast_hash`) so the stage carries
complete per-file hash information through the pipeline.
Replace per-file iteration over .ast/.iast entries with per-module
iteration that skips .resi entries and looks up the matching .iast
from the .res entry. This eliminates the mirrored is_iast branching,
the package_path relative path resolution, and fixes a subtle bug
where only one file's freshness was checked before promoting a module.
Remove the ParseState and CompileState enums and their corresponding
fields (parse_state, compile_state) from Implementation and Interface.

CompilationStage now captures the full picture:
- New ParseError variant for explicit parse failures (previously
  indistinguishable from Dirty)
- has_parse_warnings bool on Parsed, CompileError, TypeChecked, Built
- has_compile_warnings bool on TypeChecked, Built

The compile_warnings: Option<String> field stays on Implementation
and Interface for incremental warning text re-emission.
Two related changes to make CompilationStage the single source of truth:

1. Drop Copy (and unused Clone) from CompilationStage, SourceFileModule,
   MlMapModule, and Module. Switch all methods from `self` to `&self`,
   return `&CompilationStage` from the getter.

2. Replace `has_compile_warnings: bool` on TypeChecked/Built with
   `compile_warnings: Option<String>`, and remove the duplicate
   `compile_warnings` field from Implementation and Interface.
   Warning text is now merged (impl + interface) at storage time
   and read back via `compilation_stage().compile_warnings()`.

Implementation and Interface are now just {path, last_modified} —
pure file metadata with no compilation state.
AST-deletion trick

Replace `has_parse_warnings: bool` with per-file
`implementation_parse_warnings: Option<String>` and
`interface_parse_warnings: Option<String>` on all CompilationStage
variants that carry it (Parsed, CompileError, TypeChecked, Built).

The actual warning text is now stored in the stage and replayed during
incremental builds, matching how compile warnings already work. This
eliminates the indirect AST-deletion mechanism in cleanup_after_build
that forced re-parse to reproduce warnings each cycle.

Keeping implementation and interface warnings separate preserves
per-file attribution for future LSP diagnostic improvements.
When --warn-error is passed to `rescript build`, the build system now
keeps all modules in the Dirty state instead of restoring them from
disk artifacts. This ensures the parse phase re-runs with the
overridden warning flags.

Previously, cleanup_previous_build would restore modules to
Parsed/Built from fresh disk artifacts regardless of CLI flag changes.
Since warnings like 110 (%todo) are emitted during parsing (not
compilation), the --warn-error flag had no effect on subsequent builds
because the parse phase was skipped entirely.

The LSP is unaffected — it always passes None for warn_error_override,
so the restoration logic runs normally.
SourceDirty

Split the old `Dirty` stage into two distinct stages:

- `SourceDirty`: source changed on disk, needs full pipeline (reparse +
  recompile)
- `DependencyDirty`: a dependency's interface changed but this module's
  AST is still valid — only needs recompilation, not reparsing

This fixes invalid state transitions in the LSP flow where modules whose
source hadn't changed were being marked `Dirty` (needing reparse) when
only a dependency changed. The `DependencyDirty` stage preserves parse
hashes so the compile loop can skip reparsing.

Also refactors `CompileUniverse` from a struct with implicit conventions
into an enum with two explicit variants:
- `AllNeedCompile`: every module compiles unconditionally
  (TypecheckDependents)
- `DirtyWithDependents`: only `needs_compile` modules are forced;
  transitive dependents can be skipped if their deps' .cmi didn't change
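
The difference between the two dirty stages comes down to whether parse information survives invalidation. A reduced sketch, with a single `ast_hash: u64` standing in for the real hash fields and with method names chosen for illustration:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Stage {
    /// Source changed on disk: reparse and recompile.
    SourceDirty,
    /// A dependency's interface changed: the AST is still valid.
    DependencyDirty { ast_hash: u64 },
    Parsed { ast_hash: u64 },
    TypeChecked { ast_hash: u64 },
}

impl Stage {
    pub fn needs_parse(&self) -> bool {
        matches!(self, Stage::SourceDirty)
    }

    pub fn needs_compile(&self) -> bool {
        !matches!(self, Stage::TypeChecked { .. })
    }

    /// Applied when a dependency's interface changed. Parse hashes are kept,
    /// so the module can skip straight to recompilation.
    pub fn invalidate_for_dependency(self) -> Stage {
        match self {
            Stage::SourceDirty => Stage::SourceDirty,
            Stage::DependencyDirty { ast_hash }
            | Stage::Parsed { ast_hash }
            | Stage::TypeChecked { ast_hash } => Stage::DependencyDirty { ast_hash },
        }
    }
}
```
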
modules

Remove the `CompileScope` enum and monolithic `incremental_build()` function.
Each build scenario now has its own module that constructs a `CompileParams`
and calls `process_in_waves()` directly:

- `full_build.rs`: CLI/watcher full compile with progress bars and logging
- `full_typecheck.rs`: LSP initial/project build, typecheck only
- `compile_dependencies.rs`: LSP save step 1, compile dependency closure
- `typecheck_dependents.rs`: LSP save step 2, typecheck dependent closure

Introduce `CompileFilter` enum (`All` | `DirtyOnly(set)`) to replace the
`Option<AHashSet<String>>` that encoded "which modules need compilation."

Net result: ~210 fewer lines of Rust, no indirection layer, and each
scenario is self-contained and independently readable.
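
The benefit of the enum over an `Option` is that the two cases are named at the call site. A minimal sketch, using the standard `HashSet` in place of `AHashSet` and a hypothetical `includes` method:

```rust
use std::collections::HashSet;

/// Which modules a build pass should consider for compilation.
pub enum CompileFilter {
    /// Every module is a candidate.
    All,
    /// Only the named modules are candidates.
    DirtyOnly(HashSet<String>),
}

impl CompileFilter {
    pub fn includes(&self, module: &str) -> bool {
        match self {
            CompileFilter::All => true,
            CompileFilter::DirtyOnly(set) => set.contains(module),
        }
    }
}
```

With `Option<AHashSet<String>>`, a reader has to know by convention whether `None` means "everything" or "nothing"; `CompileFilter::All` removes that ambiguity.
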
Previously compile_dependencies used is_source_dirty() to populate the
dirty set, only matching SourceDirty modules. This was inconsistent with
full_build/full_typecheck which use needs_compile_for_mode(), matching
DependencyDirty, Parsed, and CompileError as well.

The old behavior was accidentally correct because the compile loop's
skip logic handled these cases, but it was subtle and fragile. Now all
four scenarios use the same filtering approach.
Add compile_mode field to CompileError variant so the LSP can
distinguish errors from user saves (FullCompile) vs dependent
typechecks (TypecheckOnly). When compile_resolved_errors runs
after typecheck_dependents, it only re-compiles modules whose
original error came from FullCompile — preventing stale JS for
files the user never saved.
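
The selection rule in this paragraph can be sketched as a snapshot-then-filter step. The types are reduced to what the rule needs, and both function names are illustrative rather than taken from the branch.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompileMode {
    FullCompile,
    TypecheckOnly,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    CompileError { compile_mode: CompileMode },
    TypeChecked,
    Built,
}

/// Taken before typechecking dependents: which modules are in error, and
/// which kind of build produced the error.
pub fn snapshot_compile_errors(modules: &BTreeMap<String, Stage>) -> Vec<(String, CompileMode)> {
    modules
        .iter()
        .filter_map(|(name, stage)| match stage {
            Stage::CompileError { compile_mode } => Some((name.clone(), *compile_mode)),
            _ => None,
        })
        .collect()
}

/// Taken after typechecking dependents: modules that recovered AND whose
/// original error came from a full compile, so JS should now be emitted.
pub fn resolved_errors_to_compile(
    snapshot: &[(String, CompileMode)],
    modules: &BTreeMap<String, Stage>,
) -> Vec<String> {
    snapshot
        .iter()
        .filter(|(_, mode)| *mode == CompileMode::FullCompile)
        .filter(|(name, _)| modules.get(name) == Some(&Stage::TypeChecked))
        .map(|(name, _)| name.clone())
        .collect()
}
```

A module whose error was only ever observed during a dependent typecheck stays at `TypeChecked` and gets its JS when the user saves it.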

Close the errored-modules hash extraction gap by explicitly
handling TypeChecked/Built stages and replacing the wildcard
match with SourceDirty|ParseError.

Rename abbreviated span names for consistency:
- compile_deps → compile_dependencies
- typecheck_deps → typecheck_dependents
- Remove dead methods from CompilationStage: has_parse_warnings,
  needs_compile, ordinal, cmi_hash, implementation_source_hash
- Fix typo: current_in_progres_modules → current_in_progress_modules
- Fix typo: checke → check
- Fix incorrect expect message: "stdout should be non-null" →
  "stderr should be valid utf-8"
- Update stale doc comments referencing removed TypecheckAndEmit
- Replace Duration::new(0.0 as u64, 0.0 as u32) with Duration::ZERO
- Remove commented-out dead code in compiler_args